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Abstract 

ly-^ i We present the first sample-optimal sublinear time algorithms for the sparse Discrete Fourier Transform over 

a two-dimensional y/n x grid. Our algorithms are analyzed for average case signals. For signals whose 
spectrum is exactly sparse, our algorithms use 0{k) samples and run in 0{k log k) time, where k is the expected 
sparsity of the signal. For signals whose spectrum is approximately sparse, our algorithm uses 0{k log n) samples 
and runs in 0{k log^ n) time; the latter algorithm works for k — 9(y^). The number of samples used by our 
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C/2 ' algorithms matches the known lower bounds for the respective signal models. 



By a known reduction, our algorithms give similar results for the one-dimensional sparse Discrete Fourier 
Transform when n is a power of a small composite number (e.g., n — 6*). 



> 

^ : 1 Introduction 

(N 

The Discrete Fourier Transform (DFT) is a powerful tool used in many domains. Multimedia data sets, 
^T) [ including video and images, are typically processed in the frequency domain to compress the data ||Wal91| 

O ■ IHPN971 IBK95II . Medicine and biology rely on the Fourier transform to analyze the output of a variety 

of tests and experiments including MRI IINislOi NMR IMEH09II and ultrasound imaging MKSOll . Other 
applications include astronomy and radar systems. 

The fastest known algorithm for computing the DFT is the Fast Fourier Transform (FFT). It computes 
^ ■ the DFT of a signal with size n in 0(n log n) time. Although it is not known whether this algorithm is 

, optimal, any general algorithm for computing the exact DFT must take time at least proportional to its 

output size, i.e., Q{n). In many applications, however, most of the Fourier coefficients of a signal are small 
or equal to zero, i.e., the output of the DFT is (approximately) sparse. This sparsity provides the rationale 
underlying compression schemes for image and video signals such as IPEG and MPEG. In fact, all of the 
aforementioned applications involve sparse data. 

For sparse signals, the Q{n) lower bound for the complexity of DFT no longer applies. If a signal has 
a small number k of nonzero Fourier coefficients — the exactly k-sparse case — the output of the Fourier 
transform can be represented succinctly using only k coefficients. Hence, for such signals, one may hope 
for a DFT algorithm whose runtime is sublinear in the signal size n. Even in the more general approximately 
k-sparse case, it is possible in principle to find the large components of its Fourier transform in sublinear 
time. 

The past two decades have witnessed significant advances in sublinear sparse Fourier algorithms. The 
first such algorithm (for the Hadamard transform) appeared in IIKM91I (building on IGL89I ). Since then. 



1 



several sublinear sparse Fourier algorithms for complex inputs have been discovered ||Man921 |GGI"'"02| 
IAGS031 IGMS05: IwelO/AkalO, HI KP12RlHIKP12aLILWC12L[BCG+12llHAKlT2ll . The most efficient of 



those algorithm^ given in IHIKP12all . offers the following performance guarantees: 

• For signals that are exactly /c-sparse, the algorithm runs in 0{k log n) time. 

• For the approximately sparse signals, the algorithm runs in 0{k log n log(n/A;)) time. 

Although the aforementioned algorithms are very efficient, they nevertheless suffer from limitations. 
Perhaps the main limitation is that their sample complexity bounds are equal to the their running times. 
In particular, the sample complexity of the first algorithm (for the exactly fc-sparse case) is @{klogn), 
while the sample complexity of the second algorithm (approximately sparse) is 0(/clog(n) log(n/A;)). The 
first bound is suboptimal by a logarithmic factor, as it is known that one can recover any signal with k 
nonzero Fourier coefficients from 0{k) samples HATOSII . albeit in super-linear time. The second bound 
is a logarithmic factor away from the lower bound of Q.{klog{n/k)) MPWllI established for non-adaptive 
algorithms!!; a shghtly weaker lower bound of log(n//c)/ log log n) applies to adaptive algorithms as 
well IIHIKP12all . In most applications, low sample complexity is at least as important as efficient running 
time, as it implies reduced signal acquisition or communication cost. 

Another limitation of the prior algorithms is that most of them are designed for one-dimensional signals. 
This is unfortunate, since multi-dimensional instances of DFT are often particularly sparse. This situation is 
somewhat alleviated by the fact that the two- dimensional DFT over p x q grids can be reduced to the one- 
dimensional DFT over a signal of length pq IIGMS051IIwel2ll . However, the reduction applies only if p and 
q are relatively prime, which excludes the most typical case of m x m grids where m is a power of 2. The 
only prior algorithm that applies to general m xm grids, due to IIGMS05II . has 0{k log'^ n) sample and time 
complexity for a rather large value of c. If n is a power of 2, a two-dimensional adaptation of the IIHIKP12bll 
algorithm (outlined in the appendix) has roughly 0{k log'^ n) time and sample complexity. 



Our results In this paper, we present the first sample-optimal sublinear time algorithms for the Discrete 
Fourier Transform over a two- dimensional y/n x y/n grid. Unlike the aforementioned results, our algo- 
rithms are analyzed in the average case. Our input distributions are natural. For the exactly sparse case, 
we assume the Bernoulli model: each spectrum coordinate is nonzero with probability k/n, in which case 
the entry assumes an arbitrary value predetermined for that positiord. For the approximately sparse case, 
we assume that the spectrum x of the signal is a sum of two vectors: the signal vector, chosen from the 
Bernoulli distribution, and the noise vector, chosen from the Gaussian distribution (see Section f|2] Prelimi- 
naries for the complete definition). These or similai0 distributions are often used as test cases for empirical 
evaluations of sparse Fourier Transform algorithms DIGS 071 IHIKP12bl ILWC12II or theoretical analysis of 
their performance IILWC121 . 

The algorithms succeed with a constant probability. The notion of success depends on the scenario 
considered. For the exactly sparse case, an algorithm is successful if it recovers the spectrum exactly. For 

' See the discussion in tlie Related Work section. 

^An algorithm is adaptive if it selects the samples based on the values of the previously sampled coordinates. If the positions of 
the samples are chosen in advance of the sampling process, the algorithm is called non-adaptive. All algorithms given in this paper 
are non-adaptive. 

^Note that this model subsumes the scenario where the values of the nonzero coordinates are chosen i.i.d. from some distribution. 

''a popular alternative is to use the hypergeometric distribution over the set of nonzero entries instead of the Bernoulli distribu- 
tion. The advantage of the former is that it yields vectors of sparsity exactly equal to k. In this paper we opted for the Bernoulli 
model since it is simpler to analyze. However, both models are quite similar. In particular, for large enough k, the actual sparsity of 
vectors in the Bernoulli model is sharply concentrated around k. 
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the approximately sparse case, the algorithm is successful if it reports a signal with spectrum z such that 

\\z-x\\l = 0{(T'^n) + \\x\\l/n'' (1) 

where denotes the variance of the normal distributions defining each coordinate of the noise vector, and 
where c is any constant. Note that any fe-sparse approximation to x has error Q{a'^n) with overwhelming 
probability, and that the second term in the bound in Equation [T] is subsumed by the first term as long as the 
signal-to-noise ratio is at most polynomial, i.e., ||x||2 < rP^^^a. See Section ^for further discussion. 

The running time and sample complexity bounds are depicted in the following table. We assume that 
^Jn is a power of 2. 



Input 


Samples 


Time 


Assumptions 


Sparse 


k 


k log k 


k = O(^) 


Sparse 


k 


k log k 








+k{\og logn)'^(i) 




Approx. sparse 


klogn 


k log^ n 


k = e(V^) 



The key feature of our algorithms is that their sample complexity bounds are optimal, at least in the non- 
adaptive case. For the exactly sparse case, the lower bound of is immediate. For the approximately 
sparse case, we note that the ^{k log{n/k)) lower bound of fPWlll holds even if the spectrum is the sum 
of a /c-sparse signal vector in {0, 1, —1}" and Gaussian noise. The latter is essentially a special case of the 
distributions handled by our algorithm, and we give a full reduction in Appendix [A] From the running time 
perspective, our algorithms are slightly faster than those in IIHIKP12all . with the improvement occurring for 
low values of k. 

An additional feature of the first algorithm is its simplicity and therefore its low "big-Oh" overhead. Our 
preliminary experiments on random sparse data indicate that the algorithm for exactly sparse case yields 
substantial improvement over 2D FFTW, a highly efficient implementation of 2D FFT. In particular, for 
n = 2^2 (a 2048 x 2048 signal) and k = 1024, the algorithm is 100 x faster than 2D FFTW. To the best 
of our knowledge, this is the first implementation of a 2D sparse FFT algorithm. For the same n and k, the 
algorithm has a comparable running time (1.5 x faster) to the ID exactly sparse FFT in IIHIKP12ai while 
using 8 X fewer samples. We expect that the algorithm or its variant will be efficient on non-random data 
as well, since the algorithm can randomize the positions of the coefficients using random two-dimensional 
affine transformations (cf. Appendix |B]i. Even though the resulting distribution is not fully random, it has 
been observed that random affine transformations work surprisingly well on real data IIMV08I . 

Our techniques Our first algorithm for A;-sparse signals is based on the following idea. Recall that one 
way to compute the two-dimensional DFT of a signal x is to apply the one-dimensional DFT to each column 
and then to each row. Suppose that k = a^/n for a < 1. In this case, the expected number of nonzero entries 
in each row is less than 1. If every row contained exactly one nonzero entry, then the DFT could be computed 
via the following two step process. In the first step, we select the first two columns of x, denoted by u^^^ 
and u^^\ and compute their DFTs u^^^ and u^^\ Let ji be the index of the unique nonzero entry in the i-th 
row of X, and let a be its value. Observe that uf'^ = a and u^/'^ = auj^^^ (where a; is a primitive y^-th 
root of unity), as these are the first two entries of the inverse Fourier transform of a 1-sparse signal acj^. 

Thus, in the second step, we can retrieve the value of the nonzero entry, equal to as well as the index ji 
from the phase of the ratio v!:^ /uf^ (this technique was introduced in HHIKP 1 2al ILWC 1 21 and was referred 
to as the "OFDM trick"). The total time is dominated by the cost of the two DFTs of the columns, which 
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(a) Original Spectrum 
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(d) Step 3: Row recovery 



(b) Step 1: Row recovery 
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(e) Step 4: Column recovery 
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(c) Step 2: Column recovery 
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(f) Step 5: Row Recovery 



Figure 1: An illustration of the "peeling" recovery process on an 8 x 8 signal with 15 nonzero frequencies. 
In each step, the algorithm recovers all 1-sparse columns and rows (the recovered entries are depicted in 
red). The process converges after a few steps. 



is 0{^/nlogn). Since the algorithm queries only a constant number of columns, its sample complexity is 

In general, the distribution of the nonzero entries over the rows can be non-uniform. Thus, our actual 
algorithm alternates the above recovery process between the columns and rows (see Figure [T] for an illus- 
tration). Since the OFDM trick works only on 1-sparse columns/rows, we check the 1-sparsity of each 
column/row by sampling a constant number of additional entries. We then show that, as long as the sparsity 
constant a is small enough, this process recovers all entries in a logarithmic number steps with constant 
probability. The proof uses the fact that the probability of the existence of an "obstructing configuration" 
of nonzero entries which makes the process deadlocked (e.g., see Figure |2]| is upper bounded by a small 
constant. 

The algorithm is extended to the case of k = o{y/n) via a reduction. Specifically, we subsample the 
signal X by the reduction ratio R = a^/n/k for some small enough constant a in each dimension. The 
subsampled signal x' has dimension ^Jm x ^Jm, where ^Jm = ^. Since subsampling in time domain 
corresponds to "spectrum folding", i.e., adding together all frequencies with indices that are equal modulo 
y/m, the nonzero entries of x are mapped into the entries of x' . It can be seen that, with constant probability, 
the mapping is one-to-one. If this is the case, we can use the earlier algorithm for sparse DFT to compute the 
nonzero frequencies in 0{y/m\ogm) = 0{y/k\ogk) time, using 0{k) samples. We then use the OFDM 
trick to identify the positions of those frequencies in x. 
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(a) (b) 

Figure 2: Examples of obstructing sequences of nonzero entries. None of the remaining rows or columns 
has a sparsity of 1. 

Our second algorithm for the exactly sparse case works for all values of k. The main idea behind it 
is to decode rows/columns with higher sparsity than 1. First, we give a deterministic, worst-case algo- 
rithm for 1-dimensional sparse Fourier transforms that takes 0{k'^ + A;(log log n)*^^^)) time. This algo- 
rithm uses the relationship between sparse recovery and syndrome decoding of Reed-Solomon codes (due 
to IIAT08I ). Although a simple application of the decoder yields 0{v?) decoding time, we show that by 
using appropriate numerical subroutines one can in fact recover a A;-sparse vector from 0{k) samples in 
time 0(A:2 + A;(log log n)^(i) jl In particular, we use Berlekamp-Massey's algorithm for constructing the 
error-locator polynomial and Pan's algorithm for finding its roots. For our fast average-case, 2-dimensional 
sparse Fourier transform algorithm, we fold the spectrum into B = ^^^^^ bins for some large constant C. 
Since the positions of the k nonzero frequencies are random, it follows that each bin receives t = 0(log k) 
frequencies with high probability. We then take 0(t) samples of the time domain signal corresponding to 
each bin, and recover the frequencies con^esponding to those bins in 0{t^ + t(log log n)'^(^)) time per bin, 
for a total time of 0{k\og k + fc(log logn)'^^^)). 

The above approach works as long as the number of nonzero coefficients per column/row are highly 
concentrated. However, this is not the case for k <^ ^Jn log n. We overcome this difficulty by replacing a 
row by a sequence of rows. A technical difficulty is that the process might lead to collisions of coefficients. 
We resolve this issue by using a two level procedure, where the first level returns the syndromes of colliding 
coefficients as opposed to the coefficients themselves; the syndromes are then decoded at the second level. 

Our third algorithm works for approximately sparse data, at sparsity 0(-y/n). Its general outline mimics 
that of the first algorithm. Specifically, it alternates between decoding columns and rows, assuming that they 
are 1-sparse. The decoding subroutine itself is similar to that of IIHIKP12all and uses 0(log n) samples. The 
subroutine first checks whether the decoded entry is large; if not, the spectrum is unlikely to contain any 
large entry, and the subroutine terminates. The algorithm then subtracts the decoded entry from the column 
and checks whether the resulting signal contains no large entries in the spectrum (which would be the case 
if the original spectrum was approximately 1-sparse and the decoding was successful). The check is done 
by sampling O(Iogn) coordinates and checking whether their sum of squares is small. To prove that this 
check works with high probability, we use the fact that a collection of random rows of the Fourier matrix is 
likely to satisfy the Restricted Isometry Property (RIP) of [CT06T. 

A technical difficulty in the analysis of the algorithm is that the noise accumulates in successive itera- 

^'We note that, for k = o(log n), this is the fastest known worst-case algorithm for the exactly sparse DFT. 
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tions. This means that a 1/ log*^^^-* n fraction of the steps of the algorithm will fail. However, we show that 
the dependencies are "local", which means that our analysis still applies to a vast majority of the recovered 
entries. We continue the iterative decoding for log log n steps, which ensures that all but a 1/log*^^^^ n 
fraction of the large frequencies are correctly recovered. To recover the remaining frequencies, we resort to 
algorithms with worst-case guarantees. 

Extensions Our algorithms have natural extensions to dimensions higher than 2. We do not include them 
in this paper as the description and analysis are rather cumbersome. 

While no optimal result is known for the 1-dimensional case, one can achieve optimal sample complex- 
ity and efficient robust recovery in the log n-dimensional (Hadamard) case ( IILev93ll . see also Appendix C.2 
of IIG0I99I ). Our result demonstrates that even two dimensions give enough flexibility for optimal sample 
complexity in the average case. Due to the equivalence between the two-dimensional case and the one- 
dimensional case where n is a product of different prime powers HGMSOSj IIwel21 . our algorithm also gives 
optimal sample complexity bounds for e.g., n = 6* in the average case. 



1.1 Related work 

As described in the introduction, currently the most efficient algorithms for computing the sparse DFT 
are due to IIHIKP12all . For signals that are exactly A:-sparse, the first algorithm runs in 0(A;logn) time. 
For approximately sparse signals, the second algorithm runs in 0(/i; log n log(n//i;)) time. Formally, the 
latter algorithm works for any signal x, and computes an approximation vector x' that satisfies the £2/^2 
approximation guarantee, i.e., ||x — x'||2 < C minfc.gparse y ||S — ylb, where C is some approximation factor 
and the minimization is over A;-sparse signals. Note that this guarantee generalizes that of Equation ([T]). 

We also mention another efficient algorithm, due to I1LWC1 21. designed for the exactly /c-sparse model. 
The average case analysis presented in that paper shows that the algorithm has 0{k) expected sample com- 
plexity and runs in 0{k log k) time. However, the algorithm assumes that the input signal x is specified as a 
function over an interval [0, 1] that can be sampled at arbitrary positions, as opposed to a given discrete se- 
quence of n samples as in our case. Thus, although very efficient, that algorithm does not solve the Discrete 
Fourier Transform problem. 

2 Preliminaries 

This section introduces the notation, assumptions and definitions used in the rest of this paper. 

Notation Throughout the paper we assume that ^/n is a power of 2. We use [m] to denote the set 
{0, ... ,m — 1}, and [m] x [m] = [m]^ to denote the m x m grid ■ i € [m\,j G [m]}. We de- 

fine uj = e~2'^'/v^ to be a primitive ^/n-th root of unity and u' = e"^'^'/" to be a primitive n-th root of 
unity. For any complex number a, we use 0(a) G [0, 27r) to denote the phase of a. For a 2D matrix 
x £ £\/nxVn^ support is denoted by supp(j;) C [y^ x [y/n]. We use ||x||o to denote |supp(a;)|, the 
number of nonzero coordinates of x. Its 2D Fourier spectrum is denoted by x, with 
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Similarly, if y is a frequency-domain signal, its inverse Fourier transform is denoted by y. 



Definitions The paper uses the comb filter used in HlwelOl IHIKP12bl (cf. |Man92|| ). The filter can be 
generalized to 2 dimensions as follows: 

Given (r^, t^) G [\/n\ x [\/n\, and B.,., that divide ^/n, then for all (i, j) G [Bj] x [B^ set 

yi,j = ^j(V"/Br)+r,.,j(v^/-Bc)+rc- 

Then, compute the 2D DFT y of y. Observe that y is a folded version of x: 

Ze[v^/Sr] me[v^/Bc] 

Distributions In the exactly sparse case, we assume a Bernoulli model for the support of x. This means 
that for all G [A/n] x [A/n], 

Pr{(i, j) G supp (x)} = kjn 

and thus E[|supp (x) |] = k. We assume an unknown predefined matrix Oj j of values in C; if j is selected 
to be nonzero, its value is set to ,,. 

In the approximately sparse case, we assume that the signal ir is equal to + {(} G C^^^, where x*i^j 
is the "signal" and w is the "noise". In particular, x* is drawn from the Bernoulli model, where x* j j is drawn 
from {0, Ojj } at random independently for each (i, j) for some values ajj and with E[| supp(x*)|] = k. We 
also require that |ajj | > L for some parameter L. it; is a complex Gaussian vector with variance in both 
the real and imaginary axes independently on each coordinate; we notate this as u; ~ iVc(0,a2/„). We will 
need that L = Ca^Jnjk for a sufficiently large constant C, so that E[||x*||2] > CE[||'u;||2]. 

3 Basic Algorithm for the Exactly Sparse Case 

The algorithm for the noiseless case depends on the sparsity k where k = E[|supp(x)|] for a Bernoulli 
distribution of the support. 

3.1 Basic Exact Algorithm: k = <^{^/n) 

In this section, we focus on the regime k = Q{^/n). Specifically, we will assume that k = a^fn for a 
(sufficiently small) constant a > 0. 

The algorithm BasicExact2DSFFT is described as Algorithm 13.11 The key idea is to fold the spec- 
trum into bins using the comb filter defined in ^and estimate frequencies which are isolated in a bin. The 
algorithm takes the FFT of a row and as a result frequencies in the same columns will get folded into the 
same row bin. It also takes the FFT of a column and consequently frequencies in the same rows wil get 
folded into the same column bin. The algorithm then uses the OFDM trick introduced in liHIKP12al to 
recover the columns and rows whose sparsity is 1. It iterates between the column bins and row bins, sub- 
tracting the recovered frequencies and estimating the remaining columns and rows whose sparsity is 1 . An 
illustration of the algorithm running on an 8 x 8 signal with 15 nonzero frequencies is shown in Fig. [Din 
Section 1. The algorithm also takes a constant number of extra FFTs of columns and rows to check for 
collisions within a bin and avoid errors resulting from estimating bins where the sparsity is greater than 1. 
The algorithm uses three functions: 
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• FoldToBins. This procedure folds the spectrum into x Be bins using the comb filter described ^ 

• BasicEstFreq. Given the FFT of rows or columns, it estimates the frequency in the large bins. If there 
is no collision, i.e. if there is a single nonzero frequency in the bin, it adds this frequency to the result w 
and subtracts its contribution to the row and column bins. 

• BasicExact2DSFFT. This performs the FFT of the rows and columns and then iterates BasicEst- 
Freq between the rows and columns until is recovers x. 

Analysis of BasicExact2DSFFT 

Lemma 3.1. For any constant a > 0, if a > is a sufficiently small constant, then assuming that all 
1-sparsity tests in the procedure BasICEStFreq are correct, the algorithm reports the correct output with 
probability at least 1 — 0{a). 

Proof. The algorithm fails if there is a pair of nonzero entries in a column or row of x that "survives" 
tmax = C log n iterations. For this to happen there must be an "obstructing" sequence of nonzero entries 
Pi,qi,P2,Q2 ■ ■ -Pt, 3 < t < tjnax, such that for each i > 1, pi and qi are in the same column ("vertical 
collision"), while Qi and pi+i are in the same row ("horizontal collision"). Moreover, it must be the case 
that either the sequence "loops around", i.e., pi = pt, or t > tmax- We need to prove that the probability of 
either case is less than a. We focus on the first case; the second one is similar. 

Assume that there is a sequence pi,qi,. . .pt~i,qt-i such that the elements in this sequence are all 
distinct, while pi = pt- If such a sequence exists, we say that the event Et holds. The number of sequences 
satisfying Et is at most y^^^* ^\ while the probability that the entries corresponding to the points in a 
specific sequence are nonzero is at most (A;/n)^(*~^) = {a / y/rif^^^^\ Thus the probabihty of Et is at most 

Therefore, the probability that one of the events Ei, . . . , Et^^^ holds is at most 

which is smaller than a for a small enough. □ 

Lemma 3.2. The probability that any 1-sparsity test invoked by the algorithm is incorrect is at most 
0(l/n(^-5)/2)_ 

To prove Lemma [3^ we first need the following lemma. 

Lemma 3.3. Let y £ C" be drawn from a permutation invariant distribution with r > 2 nonzero values. 
Let T = [2c]. Then the probability that there exists a y' such that \\y'\\o < 1 and {y — y')T = is at most 

\ m—r j 

Proof. Let ^ = be the first 2c rows of the inverse Fourier matrix. Because any 2c x 2c submatrix of A 
is Vandermonde and hence non-singular, the system of linear equations 

Az = h 

has at most one c-sparse solution in z, for any h. 

If r < c — 1, then ||y — y'||o < c so A{y — y') = implies y — y' = 0. But r > 2 so ||y — y'||o > 0. This 
is a contradiction, so if r < c then the probability that (y — y')T = is zero. Henceforth, we assume r > c. 
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procedure FoldToBins(x, B^., B^, Tr, Tc) 






Vi i = /^/R u /^/R w-r for (i, ?) ^ f-B^l X 


[Be], 




fptiii*!! 1/ thp F)PX of ?/ 






end procedure 






procedure BasicEstFreqCu^^) , v^'^\T, IsCol) 






■u; ^ 0. 






Compute J — . z^reT \ ^ ^s- 






for j G J do 












I ^ round(^(o)^) mod vn. 




> (t){h) is the phase of 6. 


s^uf. 








> Test whether the row or column is l-sparse 


jf fErPT I^S-^^ - == o) then 






if IsCol then 




whether decoding column or row 








else 












end if 






for r G T do 






^ 










end for 






end if 






end for 






return w, , '^i^^ 






end procedure 






procedure BASlcExACT2DSFFT(a;, k) 






T \2c] 




> We set c > 6 


for r G T do 






^ FOLDToBins(x, 1, 0, r). 






q^iT) , FoinToRlNSfr 1 a/m t n"! 






end for 






z <(- 






for t G [C log n] do 




> u(^) := {S(^) : T G T} 


{■u;,2*^"^\t;(^)} BASICESTFREQ(u(^\i;(^\ 


T, true). 




z 'z + id. 






{w,v^'^\u^'^^ ^ BASICESTFREQ(?y(^),u(^), 


T, false). 




'z -(^ 'z + w. 






end for 






return z 






end procedure 







Algorithm 3.1: Basic Exact 2D sparse FFT algorithm for k = 0(\/n) 
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When drawing y, first place r — (c — 1) coordinates into u then place the other c — 1 values into v, so 
that y = u + v. Condition on u, so v is a permutation distribution over m — r + c — I coordinates. We know 
there exists at most one c-sparse vector w with Aw = —Au. Then 



Pr[3y':^(y-y') = Oand||y'||o<l] 

y 

= Pr[V : A{v-y') = -^n and ||y'||o < 1] 

V 

< Pv[3y' : V — y' = w and ||y'||o < 1] = Pr[||f — wWo < 1] 

V V 

< Pr[|supp(?;) A supp(it;)| < 1] 

V 

11/ \ c-2 

m — r + c — 1 



where the penultimate inequality follows from considering the cases ||w||o £ {c— 2,c— l,c} separately. □ 
We now proceed with the proof of Lemma [ 



Proof. W.L.O.G. consider the row case. Let y be the jth row of x. Note that Uj = y-r. Observe that 
with probability at least 1 — l/n*^ we have ||y||o < r for r = clogn. Moreover, the distribution of y is 
permutation-invariant, and the test in BasicEstFreq corresponds to checking whether (y — y')T = for 
some l-sparse y' = acj. Hence, Lemma [33] with m = implies the probability that any specific test fails 
is less than c(2c/-y/n)'^^^. Taking a union bound over the ^/n\ogn total tests gives a failure probability of 
4c3logn(2c/V^)^-^ < 0(l/n(^-5)/2)_ □ 

Theorem 3.4. For any constant a, the algorithm BasicExact2DSFFT uses 0{y/n) samples, runs in time 
0{^/nlogn) and returns the correct vector x with probablility at least 1 — 0{a) as long as a is a small 
enough constant. 

Proof. From Lemma |3?T] and Lemma[3]2j the algorithm returns the correct vector x with probability at least 

1 - 0(a) - 0(n-(^-5)/2) ^ 1 _ for ^ > 5_ 

The algorithm uses only 0{T) = 0(1) rows and columns of x, which yields 0{^/n) samples. The 
running time is bounded by the time needed to perform 0(1) FFTs of rows and columns (in FoldToBins) 
procedure, and 0(log n) invocations of BasicEstFreq. Both components take time 0(y^log n). 

□ 



3.2 Reduction to Basic Exact Algorithm: k = o{y/n) 

Algorithm ReduceExact2DSFFT, which is for the case where k = o{y/n), is described in Algo- 
rithm |3]2]l. The key idea is to reduce the problem from the case where k = o{y/n) to the case where 
k = Q{^/n). To do that, we subsample the input time domain signal x by the reduction ratio R = 
a^/n/k for some small enough a. The subsampled signal x' has dimension y/m x -y/m, where ^/m = -. 
This implies that the probability that any coefficient in x' is nonzero is at most x k/n = /k = 
{a? /k) X (k^ jc?)lm = k/m, since m = k'^/a'^. This means that we can use the algorithm BASICNOISE- 
LESS2DSFFT in subsection ^3. II to recover x'. Each of the entries of x' is a frequency in x which was 
folded into x'. We employ the same phase technique used in IIHIKP12all and subsection ^3.1l to recover their 
original frequency position in x. 
The algorithm uses 2 functions: 
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• ReduceToBasicSFFT: This folds the spectrum into 0{k) x 0{k) dimensions and performs the reduc- 
tion to BasicExact2DSFFT. Note that only the 0{k) elements of x' which will be used in BasicEx- 
ACT2DSFFT need to be computed. 

• ReduceExact2DSFFT: This invokes the reduction as well as the phase technique to recover x. 



procedure REDUCEToBASlcSFFT(a;, R, Tr, Tc) 

Define x'^j = XiRj^rr,jR+Tc i> With lazy evaluation 

return BasicExact2DSFFT(x', k) 
end procedure 

procedure REDUCEEXACT2DSFFT(a;, k) 

R ^ for some constant a < 1 such that R\^Jn. 
u(Ofi) ^ REDUCEToBASlcSFFT(a;,i?,0,0) 
REDUCEToBASICSFFT(x,i?, 1,0) 
^^(0,1) ^ REDUCEToBASICSFFT(x,i?,0, 1) 

L ^ supp(?2(°'°)) n supp({t(i'°)) n supp(2(°'i)) 
for {£, m) e Ldo 

i round(0(6r )^) mod ^/n 
j mund{4>{bc)^) mod ^/n 

^ -(0,0) 
Zij ^ u^ ,^ 

end for 

return z 

end procedure 



Algorithm 3.2: Exact 2D sparse FFT algorithm for k = o{-s/ri) 



Analysis of ReduceExact2DSFFT 

Lemma 3.5. For any constant a, for sufficiently small a there is a one-to-one mapping of frequency coeffi- 
cients from X to x' with probability at least 1 — a. 

Proof. The probability that there are at least 2 nonzero coefficients among the R^ coefficients in x that are 
folded together in x', is at most 




{k/nf < {a^n/k^f{k/nf = a^/k^ 



The probability that this event holds for any of the m positions in x' is at most ma^ jk'^ = (ji^ / a^)a^ / k"^ = 
a? which is less than a for small enough a. Thus, with probabiUty at least 1 — a any nonzero coefficient in 
x' comes from only one nonzero coefficient in x. □ 
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Theorem 3.6. For any constant a > 0, there exists a constant c > such that ifk< Cy/n then the algorithm 
ReduceExact2DSFFT uses 0{k) samples, runs in time 0{k log k) and returns the correct vector x with 
probablility at least 1 — a. 

Proof. By Theorem 13.41 and the fact that each coefficient in x' is nonzero with probability 0{l/k), each 
invocation of the function ReduceToBasicSFFT fails with probability at most a. By Lemma [331 with 
probability at least 1 — a, we could recover x correctly if each of the calls to RedToBasicSFFT returns the 
correct result. By the union bound, the algorithm ReduceExact2DSFFT fails with probability at most 

a + 3 X a = 0(a). 

The algorithm uses 0(1) invocations of BasicExact2DSFFT on a signal of size 0{k) x 0{k) in 
addition to 0{k) time to recover the support using the OFDM trick. Noting that calculating the intersection 
L of supports takes 0{k) time, the stated number of samples and running time then follow directly from 
Theorem [331 □ 



4 Algorithm for Exactly Sparse Case of any sparsity k = 0{n) 
4.1 Exact ID Algorithm for k = 0{n) 

We will first present a deterministic algorithm for the one-dimensional exactly sparse case. The algorithm 
IDSFFT, described in Alg. |4.1[ computes the spectrum of a t-sparse signal x G C" and has worst case run- 
ning time of 0{t^ + t{\og log n)^^^^) for t = 0(log n). This deterministic algorithm has the fastest known 
worst case running time for k = o(logn). We will later use it to construct the algorithm EXACTlDSFFT 
that has the fastest known average case running time for k = 0{n). 



procedure lDSFFT(x,t) 

X ^ 

{(/j;^i)}ie[«] ^ SlGNALFROMSYNDROME(n, 




{xo,--- ,X2t-l}). 


Vi for all i € [I]. 


> I < tis result size 


return x 




end procedure 




procedure SlGNALFROMSYNDROME(n, {sq, • • • , 


S2t-l}) 


K{z) <— BerlekampMassey({so, • • • , S2t-i 


})■ 


f ^ PAN(A(z)). 


;> This finds {/ : |x/| > 0} of length / < t 


V ^ VANDERMONDE((a;')-/'o, • • • , {uj')-f'-^) 




V^^ <— InverseVandermonde(F) 








return {(/i,fi)}ieW 




end procedure 





Algorithm 4.1: Algorithm for computing the exact ID FFT of a t-sparse signal of size n 



IDSFFT is a wrapper for the "Signal From Syndrome" procedure SignalFromSyndrome which 
uses the following procedures: 

• BerlekampMassey: This function finds the coefficients of the error locator polynomial A{z) based on 
the input syndromes Sj = ^ ■ Xi{uj'y. The error locator polynomial, which only depends on the locations 
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of the nonzero frequency components of x, is given by: 



Constructing the error locator polynomial is a commonly used step in decoding Reed-Solomon codes IIMS77II 
In our case, the main difference is that the coefficients of A(z) lie in C whereas they lie in a finite field in 
the case of Reed-Solomon codes. By Lemma l4~T] below. the Berlekamp-Massey algorithm [,Mas69il solves 
this problem in time quadratic in t. 

Lemma 4.1 (pMas691). Given the first 2t time-domain samples of a t-frequency sparse signal, Algorithm 
BERLEKAMPMASSEYyznJi the error locator polynomial A(z) (given by Equation^ in time 0{t^). 

Proof. As shown in IIAT08L finding the positions of the nonzero frequencies is equivalent to a gener- 
alization of the Reed-Solomon decoding problem to the complex field; then, solving this complex-field 
Reed-Solomon problem reduces to recovering the lowest-order linear recurrence (A(z)) that generates a 
given sequence of "syndromes" (which equal the Xi in our case) l|Var971 . The running time is 0{t^) where 
i is the degree of the polynomial BMas69ll . □ 

Pan: By the definition of the error locator polynomial (given by Equation lljl, its roots determine the set 
{/i) ■ ■ ■ ) ft] of nonzero frequencies of x. Thus, we can use the Pan root-finding algorithm IIPan021 to find 
the complex roots of A(z). 

Lemma 4.2. ( l[Pan02\l } For a polynomial p{z) of degree t with complex coefficients and whose complex 
roots are located in the unit disk {z : \z\ < 1}, the PAN algorithm approximates all the roots of p{z) 
with an absolute error of at most and using 0{t log^ i(log^ t + log h)) arithmetic operations on b-bit 
numbers, assuming that b >t log t. 

Vandermonde: Given the nonzero frequencies of x, the problem reduces to solving a system of linear 
equations in the values of those frequencies. The coefficient matrix of this system is a Vandermonde 
matrix. Thus, this system has the form: 



V 



Vl 



s(^) where V-j = (w')*^^ (3) 



Vt 

InverseVandermonde: The Vandermonde matrix can be inverted using the optimal algorithm given 



in |Zip90 1 to find the values of the nonzero coordinates of x. 



Lemma 4.3. ( l{Zip90'j ) Given at x t Vandermonde matrix V, its inverse can be computed in time 0{t^). 
Analysis of IDSFFT 

Theorem 4.4. If x is a t-sparse signal of size n where t < O(logn), then on input {xq, ■ ■ ■ ,X2t-i} the 
procedure SigNALFromSyndroME (and hence the algorithm IDSFFT ) computes the spectrum x in time 
0(t2 + t(loglogn)0(i)). 
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Proof. By Lemma l4~n the time needed to construct the error locator polynomial K{z) is 0{t^). By Lemma 
14.21 and since t < logn, running the Pan algorithm with h = log n log log n requires 0(i log^ t(log^ t + 
log log n)) arithmetic operations on (logn log logn) -bit numbers. Since a (log n log log n)-bit arithmetic 
operation can be implemented using 0((log log n)^^^^) log n-bit operations and since we are assuming that 
arithmetic operations on log n bits can be performed in constant time, the total running time of the Pan 
algorithm is 0(t(log log n)'^^^)). Note that since the roots of K{z) are n-th roots (or later y^-roots) of 
unity, the precision of the algorithm is sufficient. Noting that, by Lemma 1431 inverting the Vandermonde 
matrix requires 0{t^) time completes the proof of the Theorem. □ 

Description and Analysis of EXACTlDSFFT The algorithm ExACTlDSFFT(Algorithm gJll com- 
putes the ID spectrum with high probability over a random fc-sparse input for any k. It uses 0{k) samples 
and runs in time 0{k{\ogk + (log log n)'^^^))) . The idea is the following: We fold the spectrum into 
9{k/ log k) bins using the ID version of the comb filter (cf. Section As shown in Lemma 1431 with high 
probability, each of the bins has Oilogk) nonzero frequencies. In this case, SlGNALFROMSYNDROME(n, u^^'') 
will then recover the original spectrum values. The generalization of this algorithm to the 2D case can be 
found in Section 1431 



procedure Exact1DSFFT(x, k) 




X ^ 




B ^ @{k/\ogk) 


> Such that B n 


T ^ \1C log A;] for a sufficiently large constant C . 




for r G T do 




Ui^ ■■= Xi(n/B)+r for i G [B] 




Compute u'-^\ the DFT of it^'^) 




end for 


> uP = {u'f^ : T G T}. 


for j G [B] do 


{{VzJi)}telClogk] ^ SlGNALFROMSYNDROME(n,nJ.'^^) 




xj. ^ Vi for all i G [C log k]. 




end for 




return x 




end procedure 





Algorithm 4.2: Exact ID sparse EFT algorithm for any sparsity k 



Lemma 4.5. Assume that x is distributed according to the Bernoulli model of Section ^ If we fold the 
spectrum into B = 0{k/ logk) bins, then for a sufficiently large constant C, the probability that there is a 
bin with more than Clog k nonzero frequencies is smaller than 0{{l/k)^'^^^°^^). 

Proof. The probability Pr{B, k, m) that there is a bin with more than m nonzero frequencies is bounded by: 

Since B = dk/ log k (for some constant d > 0), m = C log k, we get: 

Pr(ii,fc,mj<i^^^^j -logyfcVcy " ^ V^o.sciogC j 
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□ 



Theorem 4.6. If x is distributed according to the Bernoulli model of Section f|2] then Algorithm Ex- 
ACTlDSFFT runs in time 0{k{\og k + (loglogn)'-^(^))), uses 0{k) samples and returns the correct spec- 
trum x with probability at least 1 - 0((l/A;)°-^'^^°sC)_ 

Proof By Lemma |431 with probability at least 1 - 0((l/A;)°-^*^'°sC), ^^^^ Yiave at most Clog k 

nonzero frequencies each. Then, Theorem 14.4 [ guarantees the success of SlGNALFROMSYNDROME(n, S^^'') 
for every j € [B]. This proves the correctness of EXACT IDSFFT. 

The running time of the for loop over r is 0{k log k). By Theorem l44l SignalFromS YNDROME(n, 2^^'' ) 
takes time 0(log^ A; + log/c(loglogn)'^(^)) for every j G [B]. Thus, the total running time of Ex- 
ACTlDSFFT is 0(A:(logfc + (log log n)'^^^))). For every r G [Clogfc], computing u^"^) requires B = 
A;/ log k samples. Thus, the total number of samples needed is 0{k). □ 

4.2 Exact 2D Algorithm for k = 0{n) 

Here, we generalize the EXACT IDS EFT to the 2D case. For fc/ log k > ^/n, the generalization is straight- 
forward and can be found in Alg. 14.31 EXACT2DSFFT1. For fc/log k < ^/n, the generaliztion requires an 
extra step and can be found in Alg. 133]: EXACT2DSFFT2. 

When k / log k > ^Jn, the desired bucket size njB v& less than ^Jn, so we can have one-dimensional 
buckets and recover the locations with a single application of syndrome decoding. When A;/ log A; < ^Jn, 
we need to have two dimensional buckets to make them large enough. But this means syndrome decoding 
will not uniquely identify the locations, and we will need multiple tests. 



procedure Exact2DSFFT1(j;, k) 




Bi ^ ^/n. 




B2^e{k/{\ogk^)). 




T ^ [2C log A;] for a sufficiently large constant C. 




for r G T do 

Define := 2:^^(^/62)+^ for G [Bi] x [B2]. 






Compute the 2D EFT u^^) of 




end for 




X ^ 0. 


:=\v^ : TGT}. 


for G [Bi] X [^2] do 


{{fl,Vi)}ielClogk] ^ SlGNALFROMSYNDROME(V^,u5y) 




Xij^ <— vi for all / G [Clog A;]. 




end for 




return x 




end procedure 





Algorithm 4.3: Exact 2D sparse EFT algorithm for sparsity A;/ log k > ^Jn 



Description and Analysis of Exact2DSFFT1 The algorithm Exact2DSFFT1 apphes to the case 
where 

A;/ log A; > ^/n. As in the ID case, we use B = Q{k/ \ogk) buckets each of which having 0(nlog A;/A;) 
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frequencies mapping to it. We construct the buckets corresponding to a phase shift of r along the second 
dimension for all r G [2C log k]. As in the ID case, with high probability, each of those buckets will have 
at most C log k nonzero frequencies. The particular choice of the buckets above will ensure that the inputs 
to the SignalFromSyndrome procedure have the appropriate "syndrome" form. 

Theorem 4.7. Ifx is a k-sparse signal (with A;/ log k > yjn) distributed according to the Bernoulli model 
of Section f|2] then Algorithm Exact2DSFFT1 runs in time 0{klogk), uses 0{k) samples and recovers 
the spectrum x of x with probability at least 1 — [l/k^'^'-''^°^'-^y 

Proof. For every (i,j) G [Bi] x [B2] and every r G [2Clogk], u^-^j = ^ Xij^uj^'^-^^. Using 

/2=i mod B2 

the same argument as in Lemma 14.51 with high probability, every bin u has at most C log k nonzero 

frequencies. Noting that the function SlGNALFROMSYNDROME(-y/n, u'y^j^) succeeds whenever uf]^^ are the 
syndromes of a C log /c-sparse signal, implies the correctness of EXACT2DSFFT1. 



Computing 2^^) for all r G T takes time 0{k log A;). Each call to SignalFromS YNDROME(^/n, 



takes time 0(log^ k + log A;(log log n)'-^^^^) by Theorem 14.41 Thus, the overall running time is 0(A;(log k + 
(log log n)*"^^"*^^)) = 0{klogk). For every r G [Clogk], computing u^'^^ requires B\ x B2 = k/logk 
samples. Thus, the total number of samples needed is 0{k). □ 



procedure ExACT2DSFFT2(a;, k) 
B ^ B{k/ log k). 

T {0, 1, • • • , 2C log k — 1} for a sufficiently large constant C. 
for (r, s) G T X [4] do 

Compute the EFT u^'^ of u^''^ where for every i G [B], u'P''^'^'^ = a:^ri,i(v^/B)+r2 
end for 
X ^ 0. 

for i G [B\ do 

for s G [4] do t> uf'" = : r G T}. 

{{fi'\v\'^)}l&[C\ogk] ^ SlGNALFROMSYNDROME(^,ef'') 

end for 

({/o, • • • , /o(logfc)}> {2/0 G[4], le[C\ogk]} 

for / G [0(log k)] do > yf^ := {yf ^ : s e S = [4]}. 

{{90,^0), {gi,wi)} ^ SignalFromSyndrome(^, yp'') 
%!,9i ^ for all j G [2]. 
end for 
end for 
return x 
end procedure 



Algorithm 4.4: Exact 2D sparse EFT algorithm for sparsity k / log k < 



n 



Description and Analysis of Exact2DSFFT2 The algorithm Exact2DSFFT2 above applies to the 
case where k/logk < \/n. As in the ID case, we use B = Q{k/ logk) buckets, each of which having 
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0(nlogA;/A;) frequencies mapping to it (i.e. Q{^/nlogk/k) columns). We construct 4 sets of buckets 
(as opposed to 1 set in the ID case). Those sets correspond to the phase shifts (r, 0), (r, 1), (r, 2) and 
(r, 3) for all r G [2Clogk]. We run the SignalFromSyndrome procedure on each of those 4 sets. 
As opposed to the ID case, the resulting values can be the superposition of 2 or more nonzero frequency 
components. However, as shown in Lemma 14.91 with high probability, all the obtained values correspond 
to the superposition of at most 2 nonzero frequency components. The 4 corresponding superpositions (one 
from each of the 4 sets) are then combined (by the MATCH procedure) to get the union {/o, • • • , /o(iogfc)} 

of the sets {/jf^ , • • • , f^^}^^ ^_ J for all s G [4] along with the associated values {y^^^ , • • • , y^ji^^ (with a 
value if the frequency did not appear for some s). Then, we give the 4 resulting superpositions as inputs to 
the SignalFromSyndrome procedure again. The particular choice of the 4 sets of buckets above ensures 
that those inputs have the appropriate "syndrome" form. The output of this procedure will then consist of 
original spectrum values. 

Lemma 4.8. With probability at least 1-0 (l/A;°-^*^'°§'^), for every i G [B] and s G [4], the output of 
SlGNALFROMSYNDROME(-y/n, u^'*) consists of all nonzero values of the form Xf^f,^uj^^^'^ for 

f2=i mod B 

some fi G [\/n\. 

Proof As in Lemma 1431 we have that the probability that there is a bin with more than |T| /2 = Clog k 
nonzero frequencies is at most O [l/k^'^'-'^°^'-^). Moreover, for every r G T, i G [i?] and s G [4], we have: 



E E ^h,h^'^^'-'^' (4) 

f\<^[^/n] f2=i mod B 

E ( E ^fuf2^''^>-^^' (5) 

h&lV^] f2=imodB 

Noting that the function SlGNALFROMSYNDROME(Y/n, u^''^) succeeds whenever uj^'* are the syndromes 
of a |T| /2-sparse signal, we get the desired statement. □ 

Lemma 4.9. The probability that there are more than 2 nonzero frequency components that superimpose in 
a power of uj (i.e., as in Equation 0, x/1,/2 '^"^ %i J2 ^^P^^^^P^^^^ if fi = /{ <^'^d /2 = /2 mod B) is at 
mostO{!^^). 

Proof. Since ^JnjB = Q{^/n\ogk/k) frequencies map to each power of u in each bucket, the probability 
is upper bounded by 



' \ 2 J \n) - 2nB^ ^ n ' 



□ 



Lemma 4.10. With probability at least 

l-0{k\og^ A;/n- l/A:0-5^'°g^) , 

for all i G [B] the outputs o/SiGNALFromSyndroME(v^, yf ) for all I G [C log k] consist of all nonzero 
%i./2 where /2 = i mod B. 
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Proof. By Lemmas 14.91 and I4.8[ the piobabihty that all bins have at most C log k nonzero frequencies and 
all powers of uj have at most 2 nonzero frequencies is at least 1 — (/clog^ k/n — l/k^-^'~^^°^'~^Y Then for 
every i G [B] and s G 5 = [4], there are at most dog A; nonzero values of the form Xf^j^uj~^f^ 

f2=i mod B 

where fi G [\/n\, and each of those sums consists of at most 2 terms. Thus, yj^ are the syndromes of a 
2-sparse signal of the form x fj^ggUJ'^'^^ + x f^^g-^uj'^^'^^ where r is the time-domain index. This yields the 
desired statement. □ 

Theorem 4.11. Ifx is a k-sparse signal (with A;/ log k < ^/n) distributed according to the Bernoulli model 
of Section ^ then Algorithm Exact2DSFFT2 runs in time 0{k{logk + (loglogn)<^(^))), uses 0{k) 
samples and recovers the spectrum x of x with probability at least 1 — k log^ k/n — O (^l/k^'^'-^^^^^y 

Proof. Lemma |4. 1 Ol implies that Algorithm EXACT2DSFFT2 succeeds with the desired probability. 

Computing u"^'* for all r G T and all s G [4] takes time 0(fclog k). The running time of the MATCH 
procedure is 0(log k). By Theorem 14.41 each call to SignalFromS YNDROME in the for loop over s takes 
time 0(log^ A; + log A;(log log n)^^^^) whereas each one in the for loop over / takes time 0((log logn)*^^^)). 
Thus, the overall running time is 0(A;(log k + (log log n)*^^^^)). 

For every r G [Clog A;], computing u^'^^ requires B = k/logk samples. Thus, the total number of 
samples needed is 0(A;). □ 



5 Algorithm for Robust Recovery 
5.1 Preliminaries 

Following IICT06I1 we say that a matrix A satisfies a restricted isometry property (RIP) of order t with 
constant (5 > if, for all t-sparse vectors y, we have H^ylli/llylli ^ [1 — 1 + (^]. 

Suppose all columns of an A'^ x M matrix A have unit norm. Let fji — maxj^j |j4j • Aj\ be the 
coherence of A. It is folklor^ that A satisfies the RIP of order t with the constant 5 = [t — 

Suppose that the matrix A is an M x N submatrix of the N x N Fourier matrix F, with each the M 
rows of A chosen uniformly at random from the rows of F. It is immediate from the Hoeffding bound that 
if M = bfi"^ log{N/^) for some large enough constant 6 > 1 then the matrix A has coherence at most 
with probability 1 — 7. Thus, for M = Q{t^ • t log A^), A satisfies the RIP of order t with constant 6 = 0.5 
with probability 1 — 1/A^*. 

The algorithm appears in Algorithm 15.11 



5.2 Correctness of each stage of recovery 

Lemma 5.1. Consider the recovery of a column/row j in ROBUSTESTIMATECOL, where u and v are 
the results of FOLDToBins on x. Let y G denote the jth column/row ofx. Suppose y is drawn 

from a permutation invariant distribution y = y^^'^'^ + yresidue _j_ ygauss^ where mhi^^^^^^(^yhe.ad^ \yi\ > L, 
\^\^yresidue\^\^^ < eL, and y9<^^^'^ is drawn from the ^/n-dimensional normal distribution NciOjCr"^!^) with 
standard deviation a = eL/n^^^ in each coordinate on both real and imaginary axes. We do not require 
that y^^°''^, yresidue^ ygauss independent except for the permutation invariance of their sum. 
Consider the following bad events: 

^It is a direct corollary of Gershgorin's theorem applied to any t columns of A. 
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procedure RobustEstimateCol(2, v, T, T', IsCol, J, Ranks) 


it; ^ 0. 






> Set of changes, to be tested next round. 


lor J J QO 




CUllllllUc li ivallJvS 1 1 lSv_-Ui, J 11 ^ lUg lUg it. 






Pmr'f^rlnrp from IIT-TTl^PI 9qI • f^(^r\(T^ timp 
\> rlOCCUUlC iiuill ll-nilVr lZ,all . kJWkjM. lit LilllC 


^ / -^^A'-,^-^ 'T'.T , ,T% 

a ■<— median^^T^ UjU: . 




coniinuc 11 \CL\ <, Ij/a 


i> iNOuiing signincaiii rccovcrcQ 


continue ii z^^-^x \ Uj — auj > L \1 \ / iU 


> Bad recovery: probably not 1-sparse 


b meaiiT-gT- u^-uj'^'^. 




11 IsCol then 


> whether decoding column or row 


Wij <— 6. 




else 




11) A A i — h 




end if 




S ^ Su\i\ 




Ranks m — IsCol ill += Ranks TflsCol ill 




for T G r U T' do 








'^(t ) , --;(t ) I,, , — ri 

— buj 




end for 




end for 




return w, u, v, S 




end procedure 




procedure Robust2DSFFT(x, k) 




r, r c [v'^j, r = \t'\ = c(iogn) 




for r G T U T do 




^((.T^; ^ FOLDToBins(x, ^/n, 1, 0, r). 




^1.' ; ^ rOLD 1 OB INS (x, i, -y/n, T, (Jj. 




end for 




z U 




Ranks ^ li^i^^iv"-] 


> Rank of vertex (iscolumn, index) 


o , r / — 1 


"IT TI ■ 1 1 111 

> Which columns to test 


for t G [C log n\ do 




{w, n,?;, S'r-ou,} ^ RobustEstimateCol(u,€ 


,T,T , true, 5co«, Ranks). 


2" ■< — z + {(}. 




Srow ^ [Vn\ if t = 


Try every row the first time 


{w,v, u, Scoi} ^ RobustEstimateCol(i;, u, 


T, T false, Srow, Ranks). 


z z + w. 




end for 




return z 




end procedure 





Algorithm 5.1: Robust 2D sparse FFT algorithm for k = Q{^/n) 
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False negative: supp(y^'^°'^) = {i} and ROBUSTESTIMATECOL does not update coordinate i. 

False positive: ROBUSTESTIMATECOL updates some coordinate i but supp(y''*^"'^) / {i}. 

Bad update: supp(?/'*'^"'^) = {i} and coordinate i is estimated by b with \b — y^^'"^ > HyresidMeji,^ _|_ 



log log n 



eL. 



For any constant c and e below a sufficiently small constant, there exists a distribution over sets T, T' of 
size 0(log n), such that as a distribution over y and T, T' we have 

• The probability of a false negative is 1 / log'^ n. 

• The probability of a false positive is l/n'^. 

• The probability of a bad update is 1 / log'^ n. 

Proof. Let y denote the 1 -dimensional inverse DFT of y. Note that 

Uj = yr 

by definition. Therefore, the goal of ROBUSTESTIMATECOL is simply to perform reliable 1-sparse recovery 
with 0(log n) queries. Fortunately, [HIKP12all solved basically the same problem, although with more false 
positives than we want here. 

We choose T' according to the LoCATElNNER procedure from IIHIKP12ai : the set T is chosen uniformly 
at random from [^/n] . We have that 

This is exactly what the procedure HashToBins of IIHIKP12all approximates up to a small error term. 
Therefore, the same analysis goes through (Lemma 4.5 of IIHIKP12ai ) to get that HIKPLocateSignal 
returns i with 1 — 1/ log"^ n probability if \yi\ > ||y_j||2, where we define y_j := 

Defined E Cl^l^^tobethe rows of the inverse Fourier matrix indexed by T, normalized so \ Aij \ = 1. 

Then aj-^^ = {Ay)r. 
First, we prove 

^^yvesidue ^ ygaussy ^ q^^^^ 

with all but n-" probabihty. We have that E[||y9'^"'*^|||] = 2e'^L'^, so WyS'^^'^'h < 3eL with all but 
g-Q{Vn) ^ i/ii^ probability by concentration of chi-square variables. We also have that ||y'''^**'^"<^||2 < 

II yresidwe II ^ ^ ^j^ 

Next, we show 



P(yresid«e ^ ydauss^^^^ ^ 0{eLy^\) (7) 

with all but probability. We have that Ay^"-'^^^ is drawn from Nc{^, e^lF'I^x]) by the rotation invariance 
of Gaussians, so 



Py9a««^||2 < 3eLV|r| (8) 



20 



with all but e ^d^D < n probability. Furthermore, A has entries of magnitude 1 so || A?/'''^**'^"'^||2 < 

1 1 ^resiciMe 1 1 ^ ^^/|jt|" _ ^^^j^^j 

Consider the case where supp(y'^^"'^) = {i}. From Equation ^ we have 



so i is located with 1-1/ log'^ n probability by HIKPLocateSignal. 
Next, we note that for any i, as a distribution over r G [\/n\, 



(9) 



E[ 



\y-i\\2 



and so (analogously to Lemma 4.6 of ||HIKP12all . and for any i), since a = mediaiiT-gT u^pu:'^^ we have 



\a - yi\^ < ^\y-i\\l 



(10) 



with probability 1 — e^^^'^l^ = 1 — for some constant c. Hence if {i} = supp(y^'^"'^), we have 
|a — yip < O(e^L^) and therefore \a\ > L/2, passing the first check on whether i is valid. 
For the other check, we have that with I — probability 



(E 



\Uj — au 



a/2 



= \\A{y - aei)\\2 

< + y^-''^'^'^ + (yf'^"'^ - a)ei)||2 



gauss 



+ y 



< \\A{y 

< 0{eLy^\). 
where the last step uses Equation |7] This gives 



residue\ 



residue , 
Hi ~r »; 



gausi 



+ m 



E 



0{e^L^ \T\) < \T\ /lO 



so the true coordinate i passes both checks. Hence the probability of a false negative is 1/ log*^ n as desired. 

Now we bound the probability of a false positive. First consider what happens to any other coordinate 
i' ^ i when |supp(j/'^'^"'^)| = {i}. We get some estimate a' of its value. Since yl/-\/|T[ satisfies an RIP of 
order 2 and constant 1/4, by the triangle inequality and Equation |7] we have that with 1 — n^*^ probability. 



\\A{y - a'e,)h > \\A{y^""'e. - a'e,)h - W--^' + y' 



esidue\ 



>y 



head 



T\ ■ (3/4) -0(eLV|T|) 



> LV|T|/2. 

Hence the second condition will be violated, and i' will not pass. Thus if |supp(y'*'^"'^) | = 1, the probability 
of a false positive is at most n"'^. 

Next, consider what happens to the result i of HIKPLocateSignal when |supp(y'*'^°'^)| = 0. From 
Equation Q and Equation Q we have that with 1 — n^'^ probability: 

\a-yi\^ <5\\y-^\\l<0{€^L^). 
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Therefore, from Equation |6] 

|a| < \yi\ + \a-yi\ < + y^-'^^lh + |a - y^l = O(eL) < L/2 

so the first check is not passed and i is not recovered. 

Now suppose |supp(y/iead)| > 1- Lemma says that with 1 — n~'^ probabihty over the permutation, 
no (i, a) satisfies 



\\A{y 



head 



ae 



But then, from Equation [8] 



\\A{y-ae,)\\2>\\A{y 



i)\\2 



head 



< L^\T\/5. 



aei)\\2 - \\Ay 



^gauss 



> LyJ\T\ /b-0{tLy/\f 



> LV|T|/10 

so no i will pass the second check. Thus the probabihty of a false positive is 1 /rf. 
Finally, consider the probability of a bad update. We have that 



h = mean(^2/)T-a;" = y. 



head 



+ mean{Ay 



residue 



+ Ay3'''''')ru;^' 



and so 



We have that 



head 



< 



mea.n{Ay 



residue\ , ri 



+ 



mean(^y 



residue\ , ri 



< max 



{Ay 



residues 



^ 1 1 „ .residue 1 1 

^ \\y 111 



We know that Ay^"""^^^ is A/c(0, e^-L^/|T|). Hence its mean is a complex Gaussian with standard devia- 
tion eL I \f\T\ in both the real and imaginary axes. This means the probability that 



head 



> \\y 



residue i 



1 + teL/V|T 



is at most e ). Setting t = \/log log'^ n gives a 1/ log'^ n chance of a bad update, for sufficiently large 
|r|=0(logn). □ 

The following is the robust analog of Lemma [33] 

Lemma 5.2. Let y G C™ be drawn from a permutation invariant distribution with r > 2 nonzero values. 
Suppose that all the nonzero entries of y have absolute value at least L. Choose T C [m] uniformly at 
random with t := \T\ = 0(c^ log m) 

Then, the probability that there exists a y' with ||y'||o < 1 and 

\\{y-y')T\\l<eLh/n 



is at most c'i—^Y ^ whenever e < 1/^ 
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Proof. Let A = y'l/tFrx* be y1/t times the submatrix of the Fourier matrix with rows from T, so 

\\{y-y')T\\l = \\A{y-y')\\lt/n. 

By a coherence bound (see Section [5TT] ). with 1 — l/rrf probabiUty A satisfies the RIP of order 2c with 
constant 0.5. We would like to bound 

P := Pr[3y' : \\A{y - y')\\l < eh" and ||y'||o < 1] 

If r < c — 1, then y — y' is c-sparse and 

\\A{y-y')\\l>\\y-y'g/2 
> (r - 1)LV2 
>eL2 

as long as e < 1/2, giving P = 0. Henceforth, we can assume r > c. When drawing y, first place r — (c— 1) 
coordinates into u then place the other c — 1 values into v, so that y = u + v. Condition on u, so f is a 
permutation distribution over m — r + c — 1 coordinates. We would like to bound 

P = Pr[V : \\A{u + v- y')g < eL^ and ||y'||o < 1]. 

V 

Let w be any c-sparse vector such that \\A{u + Hi < eL^ (and note that if no such w exists, then since 
V — y' is c-sparse, P = 0). Then recalling that for any norm ||-||, ||a|p < 2||&|p + 2\\a + 6|p and hence 

||a + 6f > ||a||V2-||6f, 

\\A{u + v-y')g> \\A{v -y'- w)\\l/2 - \\A{u + w)g 
> \\v -y' + w\\l/A-eL'^. 

Hence 

P < Prpy : - y' + w\\l < SeL^ and ||y'||o < 1]. 

V 

Furthermore, we know that \\v — y' + wg > L^(|supp(t;) \ supp(w)| — 1). Thus if e < 1/8, 

P < Pr[|supp(f) \ supp(t(;)| < 1] 

V 

c+{m-r + c- l)c(c - l)/2 



< 



<c\— 
m 



/m— r+c— 1\ 
I c-1 / 



as desired. □ 



5.3 Overall Recovery 

Recall that we are considering the recovery of a signal x = x* + w € where x* is drawn from the 

Bernoulli model with expected k = a^/n nonzeros for a sufficiently small constant a, and w ~ A'c(0, a^ln) 
with o = eL^kjn = 0(eL/n^/^) for sufficiently small e. 



23 



It will be useful to consider a bipartite graph representation G of x*. We construct a bipartite graph with 
/n nodes on each side, where the left side corresponds to rows and the right side corresponds to columns. 
For each G supp(x*), we place an edge between left node i and right node j of weight x*(^ijy 

Our algorithm is a "peeling" procedure on this graph. It iterates over the vertices, and can with a "good 
probability" recover an edge if it is the only incident edge on a vertex. Once the algorithm recovers an edge, 
it can remove it from the graph. The algorithm will look at the column vertices, then the row vertices, then 
repeat; these are referred to as stages. Supposing that the algorithm succeeds at recovery on each vertex, 
this gives a canonical order to the removal of edges. Call this the ideal ordering. 

In the ideal ordering, an edge e is removed based on one of its incident vertices v. This happens after all 
other edges reachable from v without passing through e are removed. Define the rank of v to be the number 
of such reachable edges, and rank(e) = rank(w) + 1 (with rank(?;) undefined if v is not used for recovery of 
any edge). 

Lemma 5.3. Let c, a be arbitrary constants, and a a sufficiently small constant depending on c, a. Then 
with 1 — a probability every component in G is a tree and at most k / log'^ n edges have rank at least log log n. 

Proof. Each edge of G appears independently with probability k/n = aj ^pa. There are at most ^/n cycles 
of length t. Hence the probability that any cycle of length t exists is at most a*, so the chance any cycle 
exists is less than d} jiX — c?) < a/2 for sufficiently small a. 

Each vertex has expected degree a < 1. Exploring the component for any vertex v is then a subcritical 
branching process, so the probability that t;'s component has size at least log log n is 1/ log'^ n for sufficiently 
small a. Then for each edge, we know that removing it causes each of its two incident vertices to have 
component size less than log log n — 1 with 1 — 1/ log'^ n probability. Since the rank is one more than the 
size of one of these components, the rank is less than log log n with 1 — 2/ log"^ n probability. 

Therefore, the expected number of edges with rank at least log log n is 2A;/ log*^ n. Hence with proba- 
bility 1 — a/2 there are at most (l/a)4A;/ log'^ n such edges; adjusting c gives the result. □ 

Lemma 5.4. Let Robust2DSFFT' be a modified ROBUST2DSFFT that avoids false negatives or bad 
updates: whenever a false negative or bad update would occur, an oracle corrects the algorithm. With large 
constant probability, R0BUST2DSFFT' recovers S such that there exists a (A;/ log^ n)-sparse z! satisfying 

11^ ^ ^l|2 ^ a 2 

\\z — X — z < DO" n. 
Furthermore, only 0{k /\og^ n) false positives or bad updates are caught by the oracle. 

Proof. One can choose the random x* by first selecting the topology of the graph G, and then selecting the 
random ordering of the columns and rows of the matrix. Note that reordering the vertices only affects the 
ideal ordering by a permutation within each stage of recovery; the set of edges recovered at each stage in 
the ideal ordering depends only on the topology of G. Suppose that the choice of the topology of the graph 
satisfies the thesis of Lemma l5.3l (which occurs with large constant probability). We will show that with large 
constant probability (over the space of random permutations of the rows and columns), ROBUST2DSFFT' 
follows the ideal ordering and the requirements of Lemma ISTI are satisfied at every stage. 

For a recovered edge e, we define the "residue" x*e — Ze- We will show that if e has rank r, then 

X e- Ze S j^g^ eL. 

During attempted recovery at any vertex v during the ideal ordering (including attempts on vertices 
which do not have exactly one incident edge), let y G be the associated column/row oix — z. We split 
y into three parts y = y^'^°-'^ + yvesidue ^ ygauss^ where y'^^'"^ contains the elements of x* not in supp(z). 
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yvesidue contains X* — z over the support of z, and y9°-^'^s contains w (all restricted to the column/row 
corresponding to v). Let S = supp(y'~'^**'^"'^) contain the set of edges incident on v that have been recovered 
so far. We have by the inductive hypothesis that ||y»'es"i"e||_^ ^ ^^^gXdiVik{e)^J^ f^°^^^ eL. Since the 
algorithm verifies that ^eg5rank(e) < log log n, we have 



\y 



residue \ 



< 



' log log n 
log n 



eL < eL. 



Furthermore, y is permutation invariant: if we condition on the values and permute the rows and columns 
of the matrix, the algorithm will consider the permuted y in the same stage of the algorithm. 

Therefore the conditions for Lemma ISTI hold. This means that the chance of a false positive is l/n*^, 
so by a union bound this never occurs. Because false negatives never occur by assumption, this means we 
continue following the ideal ordering. Because bad updates never occur, new residuals have magnitude at 
most 



Because ||y^es4d«e||_^ 
magnitude at most 



log log n T 
log n 



\y 



residue i 



1 + 



' log log n 
log n 



eL. 



— J2eeS^^^^i^) ~ rank(ti) = rank(e) — 1, each new residual has 



rank(e) 



' log log n 
log n 



eL < eL. 



(11) 



as needed to complete the induction. 

Given that we follow the ideal ordering, we recover every edge of rank at most log log n. Furthermore, 
the residue on every edge we recover is at most eL. By Lemma [531 there are at most k/log^ n edges that we 
do not recover. From Equation (ITTIi. the squared £2 norm of the residues is at most e^L'^k = e^C'^a'^n/k-k < 
a'^n for e small enough. Since \\w\\2 < 2a'^n with overwhelming probability, there exists a so that 



X 



z 



< 2\\z 



z'\\2 + 2||w||2 < 6(T^n. 



Finally, we need to bound the number of times the oracle catches false positives or bad updates. The 
algorithm applies Lemma |5?T] only 2^/n + 0{k) = 0{k) times. Each time has a 1/ log'^ n chance of a false 
positive or bad update. Hence the expected number of false positives or bad updates is 0(A;/ log"^ n). □ 

Lemma 5.5. For any constant a > 0, the algorithm R0BUST2DSFFT can with probability 1 — a recover 



z such that there exists a (fc/ log 



c-l 



n 



-sparse z' satisfying 



z 9 < ba n 



using 0{k log n) samples and 0{k log^ n) time. 

Proof. To do this, we will show that changing the effect of a single call to RobustEstimateCol can 
only affect logn positions in the output of ROBUST2DSFFT. By Lemma \5A\ we can, with large constant 
probability turn ROBUST2DSFFT into R0BUST2DSFFT' with only 0(A;/ log'' n) changes to calls to Ro- 
bustEstimateCol. This means the output of Robust2DSFFT and of Robust2DSFFT' only differ in 
0(A:/ log'^"^ n) positions. 
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We view RobustEstimateCol as trying to estimate a vertex. Modifying it can change from recover- 
ing one edge (or none) to recovering a different edge (or none). Thus, a change can only affect at most two 
calls to RobustEstimateCol in the next stage. Hence in r stages, at most 2^'^^ calls may be affected, so 
at most 2^' edges may be recovered differently. 

Because we refuse to recover any edge with rank at least log log n, the algorithm has at most log log n 
stages. Hence at most log n edges may be recovered differently as a result of a single change to RobustEs- 
timateCol. □ 

Theorem 5.6. Our overall algorithm can recover x' satisfying 

\\x - x'Wl < 12cr^n + llxlla/n'' 

with probability 1 — a for any constants c, a > in 0{klogn) samples and 0{k\o^ n) time, where 
k = a^/nfor some constant a > 0. 

Proof. By Lemma [531 we can recover an 0(fc)-sparse z such that there exists an (fc/log'^^^ n)-sparse 2' 
with 

11^ ^ ^Il2 ^ a 2 

\\x — z — z < DO" n. 

with arbitrarily large constant probability for any constant c using 0{k log^ n) time and 0{k log n) samples. 
Then by Theorem IB. II in Appendix |Bj we can recover az' in 0{k log^ n) time and 0{k log^~'^ n) samples 
satisfying 

and hence x' := z + z' is a good reconstruction for x. □ 
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A Sample lower bound for our distribution 

We will show that the lower bound on £2/^2 recovery from UPWl 11 applies to our setting with a simple 
reduction. First, we state their bound: 

Lemma A.l ( BPWlli section 4). For any k < n/logn and constant e > 0, there exists a distribution 
Dk over k-sparse vectors in {0,1,-1}" such that, for every distribution of matrices A G M™-^" with 
m = o{k log{n/k)) and recovery algorithms A, 

Pt:[\\A{A{x + w)) - x\\2 < Vk/5] < 1/2 

as a distribution over x ~ Dj. and w ~ -/V(0, cr'^In) with o"^ = ek/n, as well as over A and A. 

First, we note that we can replace D}^ with U^, the uniform distribution over fc-spai^se vectors in {0, 1,-1} 
in Lemma IaTTI To see this, suppose we have an {A, A) that works with 1/2 probability over Uk- Then 
for any A;-sparse x G {0, 1, —1}", if we choose a random permutation matrix P and sign flip matrix S, 
PSx ~ Uk- Hence, the distribution of matrices APS and algorithm A!{x) = A{{PS)~^x) works with 1/2 
probability for any x, and therefore on average over Dk- This implies that A has U.{k\og{n / k)) rows by 
Lemma lAin Hence, we can set Dk = Uk in Lemma IATTI 

Our algorithm works with 3/4 probability over vectors x that are not necessarily A;-sparse, but have 
a binomial number B{n, k/n) of nonzeros. That is, it works over the distribution U that is Uk' ■ k' ~ 
B{n,k/n). With 1 - e'^^'') > 3/4 probability, k' G [k/2,2k]. Hence, our algorithm works with at least 
1/2 probability over {Uk' : k' ~ B{n, k/n) nk' £ [k/2, 2k]). By an averaging argument, there must exist a 
k' G [k/2, 2k] where our algorithm works with at least 1/2 probability over Uk' ; but the lemma implies that 
it must therefore take Q{k' log{n/k')) = Q{klog{n/k)) samples. 

B Robust 2D FFTs 

This section outlines the straightforward generalization of IIHIKP12all to two dimensions, as well as how to 
incorporate the extra parameter z of already recovered coefficients. Relative to our result of Theorem 15.61 



28 



this result takes more samples. However, it does not require that the input be from a random distribution and 
is used as a subroutine by Theorem 15.61 after decreasing the sparsity by a log^ n factor. 

Because we use this as a subroutine after computing an estimate z of x, we actually want to estimate 
X — z where we have oracle access to x and to z. 

Theorem B.l. There is a variant of hHlKP12a\l algorithm that will, given x, z G d^v^xv^^ return x' with 

\\x — z — x'\\2 < 2 ■ min__^||x — z — x*!!! + ||S||2/'^'^ 

k-sparse x* 

with probability 1 — a for any constants c, a > in time 

0{k \og{n/k) log^ n + |supp(z)| log(n/A:) log n), 
using 0{k \og{n/k) log^ n) samples of x. 

Proof. We need to modify IIHIKP12all in two ways: by extending it to two dimensions and by allowing the 
parameter z. We will start by describing the adaptation to two dimensions. 

The basic idea of IIHIKP12all is to construct from Fourier measurements a way to "hash" the coordinates 
m B = 0{k) bins. There are three basic components that are needed: a permutation that gives nearly 
pairwise independent hashing to bins; a filter that allows for computing the sum of bins using Fourier 
measurements; and the location estimation needs to search in both axes. The permutation is the main 
subtlety. 

Permutation Let M C [\/n]^^^ be the set of matrices with odd determinant. For notational purposes, for 
V = we define x^ := Xi^j. 

Definition B.2. For M e M and a,b e [^n]^ we define the permutation PM,a,bC'^'''^ ^V^xV^ hy 

We also define iTM,bi'v) = M{v — b) mod ^/n. 
Claim B.3. P^x^^^^ ^(^^ = x,io^^^'^^ 



Proof. 



^ «e[v/;j]2 



Mb 



Me[v/ri]2 



where we used that M'^ is a bijection over [^/n\'^ because det(M) is odd. □ 
This gives a lemma analogous to Lemma 2.4 of IIHIKP12all . 
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Lemma B.4. Suppose v G [y^^ '■^ ""f 0. Then 

Pr [Mv e[-C,C]^ (mod^/^)]<0( — ). 
M^M 11 

Proof. For any u, define G{u) to be the largest power of 2 that divides both uq and ui. Define g = G{v), 
and let S = {u & [V^^ I G{u) = g}. We have that Mv is uniform over S": is a group and S is the orbit 
of (0,5). 

Because S lies on a lattice of distance g and does not include the origin, there are at most (2 \ C /g\ + 
if - I < ^(C/g)"^ elements in S n [-C, Cf, and (3/4)n/5^ total elements in S. Hence the probabihty is 
almost (32/3) CVra. □ 

We can then define the "hash function" hM,b '■ [\/"]^ — ^ [V^]^ given by {hM,b{u)) = round(7rAf,fe(n) • 
y/n/B); i.e., round to the nearest multiple of s/n/B in each coordinate and scale down. We also define 
the "offset" OM,biu) = T^Mfiiu) — BhM,b{u). This lets us give results analogous to Claims 3.1 and 3.2 
of IIHIKP12all : 

• PT[hM,b{u) = hM,b{v) < 0{1/B)] for u V. In order for h{u) = h{v), we need that TTM,b{u) — 
T^Mfiiv) e [-2y/n/B,2y/n/B]'^. But Lemma El implies this probability is 0{l/B). 

• Pr[ojv/,{,(u) ^ [— (1 — q) Y^n/S, (1 — a) Y^n/i?]^] < 0(a)foranya > 0. Because of the offset 6, o^f, 6 ("u) 
is uniform over [—y^n/B, ^njB^ . Hence the probability is 2a — c? ^ o(l) by a volume argument. 

which are all we need of the hash function. 

Filter Modifying the filter is pretty simple. Specifically. HHIKP 1 2all defined a filter G G with support 
size 0(\/Slogn) such that G is essentially zero outsize \— ^Jt}JB ^ ^JrifB\ and is essentially 1 inside 
[—(1 — a)^^n/B^ (1 — a)^Jn/B\ for constant a. We compute the \fB x \fB 2-dimensional DFT of 
x'i j = XijGiGj to sum up the element in each bin. This takes Blog^n samples and time rather than 
B log n, which is the reason for the extra log n factor compared to the one dimensional case. 

Location Location is easy to modify; we simply run it twice to find the row and column separately. 

In summary, the aforementioned adaptations leads to a variant of the BHIKPllaP algorithm that works 
in two dimensions, with running time 0(fclog(n/fc) log^ n), using 0{klog{n/k) log^ n) samples. 

Adding extra coefficient list 2 The modification of the algorithm of IIHIKP12all (as well as its variant 
above) is straightforward. The algorithm performs a sequence of iterations, where each iteration involves 
hashing the frequencies of the signal into bins, followed by subtracting the already recovered coefficients 
from the bins. Since the algorithm recovers @{k) coefficients in the first iteration, the subtracted list is 
always of size Q{k). 

Given the extra coefficient list, the only modification to the algorithm is that the list of the subtracted 
coefficients needs to be appended with coefficients in z. Since this step does not affect the samples taken 
by the algorithm, the sample bound remains unchanged. To analyze the running time, let k' be the number 
of nonzero coefficients in £ Observe that the total time of the original algorithm spent on subtracting the 
coefficients from a list of size Q{k) was 0(/clog(n//i;) log n), or 0(log(n/A;) log n) per list coefficient. 
Since in our case the number of coefficients in the list is increased from ©{k) to k' + Q{k), the running time 
is increased by an additive factor of 0{k' log{n/k) log n). □ 
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